Abstract

In this study, a novel machine learning algorithm is presented for disaggregation of satellite soil moisture (SM) 
based on self-regularized regressive models (SRRM) using high-resolution correlated information from auxiliary 
sources. It includes regularized clustering that assigns soft memberships to each pixel at fine-scale followed by a 
kernel regression that computes the value of the desired variable at all pixels. Coarse-scale remotely sensed SM 
were disaggregated from 10km to 1km using land cover, precipitation, land surface temperature, leaf area index, 
and in-situ observations of SM. This algorithm was evaluated using multi-scale synthetic observations in NC Florida 
for heterogeneous agricultural land covers. The root mean square error (RMSE) for 96% of the
pixels was less than 0.02 m³/m³. The clusters generated represented the data well and reduced the RMSE by
up to 40% during periods of high heterogeneity in land cover and meteorological conditions. The Kullback-Leibler
divergence (KLD) between the true SM and the disaggregated estimates is close to 0 for both vegetated and baresoil
land covers. The disaggregated estimates were compared to those generated by the Principle of Relevant Information
(PRI) method. The RMSE for the PRI disaggregated estimates is higher than the RMSE for the SRRM on each day
of the season. The KLD of the disaggregated estimates generated by the SRRM is at least four orders of magnitude
lower than that of the PRI disaggregated estimates, while the computational time needed was reduced by a factor of
three. The results indicate that the SRRM can be used for disaggregating SM with complex non-linear correlations
on a grid with high accuracy.
I. Introduction

SM is a key governing factor in surface and sub-surface hydrological and agricultural models as it regulates 
land-atmosphere interactions. It has also been recognized as an essential climate variable by the Global Climate 
Observing System [1]. Representational models of weather [2]-[4], crop growth [5], ecosystem and carbon cycle 
processes [6], [7], dust generation [8], trace gas fluxes [9], and agricultural drought [10], [11] require soil moisture 
data at a fine spatial resolution. Recent satellite missions, including the European Space Agency (ESA) Soil Moisture 
and Ocean Salinity (SMOS) and the National Aeronautics and Space Administration (NASA) Soil Moisture Active 
Passive (SMAP) missions [12], provide for SM retrievals at unprecedented spatial resolutions of tens of kilometres 
every 2-3 days, with worldwide coverage. However, models simulating physical processes for agricultural regions 
need SM at even finer scales of 1 km [11]. Disaggregation addresses this discrepancy in scales by generating local 
fine-resolution data from coarse-resolution data obtained from satellites. 

Most of the disaggregation techniques broadly fall into three approaches. The first approach is based on the 
assumption that spatial disaggregation follows a known hierarchical model such as fractal interpolation, power-law 
or temporal persistence across scales. Methods using this approach usually assume static vegetation and
micro-meteorology for a given area, due to the difficulties associated with parametrizing weather and land cover data
across temporal and spatial scales in such models. However, the static assumption in this approach introduces large
errors in realistic applications. The second approach uses empirical models based on statistical and geo-statistical
methods, such as regression, co-kriging and block kriging, and fractal interpolation. The third approach employs
statistical models based on the Triangle Method [13]-[15] to extrapolate the dependent data within the hypothetical
triangle formed by the observed data. The robustness of the statistical methods over heterogeneous vegetation and
weather conditions remains mostly untested. Treating each pixel as an independent sample, instead of using spatial
information to regularize the disaggregation, results in salt-and-pepper noise because the spatial auto-correlation
between pixels is ignored [16]. Moreover, these approaches use second-order metrics, which do not leverage all the
information in the data that is necessary in a highly non-linear regression problem such as disaggregation [17].

A recently implemented disaggregation algorithm [18] based on the principle of relevant information (PRI) 
addresses the above inadequacies by utilizing the full probability density function of a set of training observations, 
rather than second order moments, to approximate a transformation function that relates micro-meteorological data 
recorded in a region to in-situ soil moisture (SM). It uses the transformation function to generate an initial set of SM 
values for the rest of the data set. The disaggregated SM is obtained by iterating between the coarse scale SM values 
and the initial SM values using an information theoretic cost function. The PRI method was compared to the widely 
used disaggregation algorithm based on a second order regression using the Triangle Method [14]. It was found to 
have lower disaggregation errors, especially for complex noise models added to the coarse resolution SM. Notably,
the Kullback-Leibler divergence between the true and disaggregated SM was 50% lower for the PRI method, compared
to the Triangle Method. This is because methods based on second-order triangular or quadrilateral regressions
do not have separate steps for controlling error bias and error variance, and rely on the data being well-posed to
achieve a balance between the two. Although the PRI method results in low disaggregation
errors, training a fully Bayesian transformation function is computationally intensive. Additionally, it requires a
comprehensive training set for the initial estimate of the multi-dimensional PDF to converge. In this study, a
self-regularized regressive model (SRRM) is used to disaggregate SM. It is expected to be less computationally intensive
because it uses auxiliary features correlated to SM to cluster the pixels and subsequently trains a single model
for each cluster. Furthermore, it requires fewer samples for training.

The goal of this study is to develop and implement a novel machine learning algorithm to disaggregate coarse-scale
remotely sensed SM using auxiliary fine-scale data. The primary objectives of this study are to: 1) develop an
algorithm that identifies contiguous regions of similarity in gridded images and subsequently uses kernel
regression to estimate a disaggregation model for each region; 2) implement this algorithm to estimate SM at 1 km
using SM at 10 km and other spatially correlated variables in the region such as land surface temperature (LST),
leaf area index (LAI), land cover (LC) and precipitation (PPT); and 3) evaluate the SRRM-based methodology and
compare it with the PRI method using a synthetic dataset.

Section II describes the theoretical details of the disaggregation framework based on self-regularized regressive 
models and provides a brief description of the PRI algorithm for disaggregation. Section III illustrates the steps for 
the implementation of the SRRM and presents the disaggregation results for SM at 1 km and Section IV summarizes 
the important results, concludes the paper, and outlines the scope for future studies. 

II. Disaggregation Framework 

Disaggregation is an ill-conditioned problem that is limited physically by the convolution of the point spread 
function of the imaging system. This constrains the generation of fine-scale data from coarse-scale data. Additional 
spatially correlated information is needed to regularize the fine-scale estimates. Methods that use regression to bridge 
the difference in scales have to use regularization to address the multiplicity of solutions. The SRRM addresses this
problem by using a clustering algorithm to create a number of regions of similarity, which are subsequently used
in a kernel regression framework. This is described in more detail in the following sections. Using spatial regions
or dynamic conglomerations of pixels to generate models, instead of treating each pixel as an independent sample,
also reduces the effect of spatial autocorrelation on the disaggregated estimates.


A. Disaggregation Framework based on Self-Regularized Regressive Models (SRRM) 

In this study, contiguous regions are identified in multi-dimensional correlated data using clustering and subsequently
a regression model is trained for each cluster for disaggregation. The membership vector of every pixel
to a region, and thus to a model, is soft and constrained to sum to one across the space of models. The models
themselves are trained using a kernel regression based method. It is a novel way to account for correlated features
using algorithms that require an independent and identically distributed (IID) assumption [16]. Figure 1 shows
a flow diagram of the algorithm for generating disaggregated estimates. The overall organization and the datasets
involved are shown in Figure 2. The two steps of the algorithm are clustering and kernel regression, as follows.

1) Information theoretic (IT) clustering based on the Cauchy-Schwarz Distance: Commonly used clustering 
methods, such as the K-Means [19], assume hyper-spherical or hyper-elliptical clusters [20]. With gridded remotely 
sensed data, prior assumptions about cluster shapes are not advisable and lead to noise in the clustering result, as 
shown in Figure 3(a). Instead, in the IT clustering method the generalized proximity regions are identified using 
a regularized variant of a clustering method based on information theory [20]. The clusters are constructed using 
the probability density functions (PDFs) of the data, resulting in clusters that are representative of the input data, 
as shown in Figure 3(b). For any two vectors x and y, the Cauchy-Schwarz inequality is, 


$$ -\log\left( \frac{\langle x, y \rangle}{\sqrt{\langle x, x \rangle \langle y, y \rangle}} \right) \geq 0 \qquad (1) $$

where $\langle x, y \rangle$ is the inner product of vectors x and y. For PDFs p(x) and q(x), the inner product is defined as
$\langle p, q \rangle = \int p(x) q(x)\,dx$ over the support of the distributions p and q. Then, the Cauchy-Schwarz inequality in a
metric space spanned by the PDFs is,

$$ -\log\left( \frac{\int p(x) q(x)\,dx}{\sqrt{\int p^2(x)\,dx \int q^2(x)\,dx}} \right) \geq 0 \qquad (2) $$

If p(x) is calculated using pixels lying in cluster $C_1$ and q(x) is calculated using pixels lying in cluster $C_2$, the
maximum separation is obtained between clusters when the left-hand side of Equation 2, the Cauchy-Schwarz
distance ($D_{CS}$), is maximized. Since the logarithm is a monotonically increasing function, the argument of the
logarithm in $D_{CS} = -\log J_{CS}(p, q)$ can equivalently be minimized using gradient descent based optimization. An
estimator $\hat{J}_{CS}$ of $J_{CS}(p, q)$ can be constructed from data samples and extended to the case of multiple clusters by
using a membership vector.
where $m_i$ is a soft K-dimensional membership vector whose k-th element expresses the degree of membership of pixel i to the k-th
cluster. K is the total number of clusters, which has to be supplied as input. $G_{\sigma\sqrt{2}}(\cdot, \cdot)$ is derived from the convolution
of two Gaussian kernels, defined as $G_{\sigma\sqrt{2}}(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{4\sigma^2}\right)$. A regularized version can be used as an
objective function of clustering,

$$ J_{CS}^{REG}(m_1, \ldots, m_N) = \frac{\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(1 - m_i^T m_j\right) G_{\sigma\sqrt{2}}(x_i, x_j)}{\sqrt{\prod_{k=1}^{K}\sum_{i=1}^{N}\sum_{j=1}^{N} m_{ik} m_{jk} G_{\sigma\sqrt{2}}(x_i, x_j)}} - \nu \sum_{i=1}^{N}\sum_{k=1}^{K} m_{ik}\log(m_{ik}) \qquad (4) $$

The second term of the objective function, weighted by the regularization constant $\nu$, is an estimate of the Shannon entropy of the
membership vectors and serves to regularize the membership vectors such that the model selection is sufficiently sparse. Obtaining the
correct membership vectors is then equivalent to solving this constrained optimization problem:

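$$ \min_{m_1, \ldots, m_N} J_{CS}^{REG}(m_1, \ldots, m_N) \quad \text{subject to} \quad m_j^T \mathbf{1}_{K \times 1} - 1 = 0, \quad j = 1, \ldots, N \qquad (5) $$

where $\mathbf{1}_{K \times 1}$ is a K × 1 vector whose elements are all one. Consider $m_{ik} = v_{ik}^2$, k = 1,..., K, which corresponds
to a form that can be optimized by using Lagrange multipliers. The Lagrangian can be expressed as,

$$ L = J_{CS}^{REG}(v_1, v_2, \ldots, v_N) + \sum_{j=1}^{N} \lambda_j \left(v_j^T v_j - 1\right) \qquad (6) $$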
The optimization problem in Equation 6 amounts to adjusting the vectors $v_i$, i = 1,..., N such that,

$$ \frac{\partial J_{CS}^{REG}}{\partial v_i} = \left(\frac{\partial m_i}{\partial v_i}\right)^T \frac{\partial J_{CS}^{REG}}{\partial m_i} = \Gamma \frac{\partial J_{CS}^{REG}}{\partial m_i} \rightarrow 0 \qquad (7) $$

where $\Gamma = \mathrm{diag}(2\sqrt{m_{i1}}, \ldots, 2\sqrt{m_{iK}})$ is the magnitude normalizing factor. The memberships are forced to be
positive by adding a constant of small magnitude, $\alpha \approx 0.05$, to all elements of $\Gamma$. The Lagrange multipliers,
after constructing the necessary Lagrange function, are then given by
The updated vector for the next iteration is,

$$ v_i^{+} = -\frac{1}{2\lambda_i}\frac{\partial J_{CS}^{REG}}{\partial v_i} \qquad (9) $$

The square roots of the membership vectors are initialized as $v_i = |\mathcal{N}(0; \gamma^2 I)|$, where $\mathcal{N}$ denotes the Gaussian distribution
and $\gamma$ is a very small number.
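
As an illustration of how the clustering cost behaves, the following minimal sketch evaluates a regularized Cauchy-Schwarz objective for a given set of soft memberships. It assumes the cost has the form of a between-cluster kernel sum divided by the square root of the product of within-cluster kernel sums, plus a weighted Shannon entropy of the memberships. The function names, the entropy weight `nu`, and the kernel width `sigma` are hypothetical choices made for this example and are not taken from the original implementation.

```python
import numpy as np

def gaussian_gram(X, sigma):
    """Pairwise Gaussian kernel of width sigma*sqrt(2): exp(-||xi - xj||^2 / (4 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (4.0 * sigma**2))

def cs_clustering_objective(X, M, sigma, nu):
    """Regularized Cauchy-Schwarz clustering cost for soft memberships M (N x K).

    Assumed form: between-cluster kernel sum divided by the square root of the
    product of within-cluster kernel sums, plus nu times the Shannon entropy
    of the memberships.
    """
    G = gaussian_gram(X, sigma)
    between = 0.5 * np.sum((1.0 - M @ M.T) * G)
    within = np.einsum("ik,ij,jk->k", M, G, M)  # one kernel sum per cluster
    entropy = -np.sum(M * np.log(np.clip(M, 1e-12, None)))
    return between / np.sqrt(np.prod(within)) + nu * entropy
```

For two well-separated groups of pixels, memberships that follow the groups give a cost near zero, whereas uninformative memberships of 0.5 give a cost of one plus the entropy penalty, which is the behaviour a gradient-based minimizer would exploit.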

Stochastic Approximation of the Gradient and Computational Complexity: If $J_{CS}$ is represented as U/V, then the
gradient of $J_{CS}^{REG}$ can be calculated as:

and the gradients of U and V are defined as,

Kernel Annealing: The objective function in Equation 4 has local minima that can inhibit the performance of
this algorithm. To steer the clustering solution toward a global, rather than a local, minimum, the kernel width
is gradually decreased over the course of the iterations. The initial value of the kernel width is chosen
according to Silverman's rule of thumb [21], given by

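$$ \sigma_{SIL} = \sigma_X \left(4 N^{-1} (2d + 1)^{-1}\right)^{\frac{1}{d+4}} \qquad (13) $$

where d is the dimensionality of the data, N is the number of samples, $\sigma_X^2 = d^{-1}\sum_i \Sigma_{X_{ii}}$, and $\Sigma_{X_{ii}}$ are the
diagonal values of the sample covariance matrix. The lower value of the kernel size is set to $\sigma_{LOW} = \sigma_{SIL}/4$. Thus
the annealing rate is,

$$ r = \frac{\sigma_{SIL} - \sigma_{LOW}}{N_{TOT}} = \frac{3\,\sigma_{SIL}}{4 N_{TOT}} \qquad (14) $$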
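
The annealing described above can be sketched as follows. The example assumes that the Silverman width is scaled by the root of the mean per-feature variance and that the kernel width decreases linearly from the Silverman value to one quarter of it over the run. The function names and the argument `n_tot` (taken here to be the total number of iterations) are assumptions made for illustration.

```python
import numpy as np

def silverman_width(X):
    """Silverman's rule of thumb for an (n samples x d features) array."""
    n, d = X.shape
    # sigma_X^2 is the mean of the diagonal of the sample covariance matrix
    sigma_x = np.sqrt(np.mean(np.var(X, axis=0, ddof=1)))
    return sigma_x * (4.0 / (n * (2.0 * d + 1.0))) ** (1.0 / (d + 4.0))

def annealing_schedule(sigma_sil, n_tot):
    """Kernel widths for iterations 0..n_tot, decreasing linearly to sigma_sil / 4."""
    sigma_low = sigma_sil / 4.0
    rate = (sigma_sil - sigma_low) / n_tot  # equals 3 * sigma_sil / (4 * n_tot)
    return sigma_sil - rate * np.arange(n_tot + 1)
```

A call such as `annealing_schedule(silverman_width(X), 30)` would then give one kernel width per clustering iteration.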

2) Regularized Kernel Regression: A kernel based regression technique that uses a training set of pixels and 
fits a function to it, by minimizing the representational error, is used to generate the disaggregated estimates. 
Ridge regression [22] is a parametric regression technique that adds a scaled regularizing term to the cost function.
This improves the stability of the regression as the added $L_2$-norm term in the cost function results in smaller
eigenvalues. The cost function is,

$$ \mathcal{L}(w, x) = \frac{1}{2}\sum_i \left(y_i - w^T x_i\right)^2 + \frac{\mu}{2}\|w\|^2 \qquad (15) $$

The weights can be calculated by differentiating the cost function with respect to the weights and setting
the derivative to zero.

$$ \frac{\partial \mathcal{L}}{\partial w} = 0 \Rightarrow w = \left(\sum_i x_i x_i^T + \mu I\right)^{-1}\left(\sum_i y_i x_i\right) \qquad (16) $$

For computation in a Reproducing Kernel Hilbert Space (RKHS), the inner products can be replaced with
kernel evaluations. Let $\mathcal{H}$ be a Hilbert space with an inner-product metric $\langle \cdot, \cdot \rangle_{\mathcal{H}}$. Then, according to the representer
theorem, a kernel function $\kappa(x, y)$ exists on $\mathbb{R}^N \times \mathbb{R}^N$ such that $\langle \phi(x), \phi(y) \rangle_{\mathcal{H}} = \kappa(x, y)$, where $\phi : \mathbb{R}^N \to \mathcal{H}$ is the
mapping that transforms a feature vector in the original vector space to $\mathcal{H}$. Stacking the mapped training vectors as the
columns of $\Phi$, the weights can be redefined as,

$$ w = \left(\mu I_D + \Phi\Phi^T\right)^{-1}\Phi y \qquad (17) $$

where D is the dimension of the feature space. The dimension of the feature space is not well-defined in many
cases, so the weights can be rewritten using the identity $(A^{-1} + B^T C^{-1} B)^{-1} B^T C^{-1} = A B^T (B A B^T + C)^{-1}$,

$$ w = \Phi\left(\mu I_N + \Phi^T\Phi\right)^{-1} y \qquad (18) $$

The weight vector w can be calculated using a training set of observations where y is known. This can then be
used to calculate the estimated value for a new data point x',

$$ \hat{y} = w^T\phi(x') \qquad (19) $$
Here, the Gram matrix $K = \Phi^T\Phi$ contains the inner products of all pairs of training data points. This formulation does not
include the constant term that must be present in the regression. To solve this problem, the feature vector is augmented by adding a constant
feature 1 to all samples.
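
A minimal sketch of the regularized kernel regression is given below. It assumes that the prediction is computed purely from kernel evaluations by combining Equations 18 and 19, and it interprets the constant feature as adding 1 to every kernel value, which is what appending a constant 1 to the mapped feature vectors implies. The function names and the Gaussian kernel width argument are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, width):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * width**2))

def linear_kernel(A, B):
    """Plain inner-product kernel, useful for checking against ridge regression."""
    return A @ B.T

def fit_kernel_ridge(K_train, y, mu):
    """Solve (mu I + K + 1) alpha = y; the '+ 1' accounts for the constant feature."""
    n = K_train.shape[0]
    return np.linalg.solve(mu * np.eye(n) + K_train + 1.0, y)

def predict_kernel_ridge(K_test_train, alpha):
    """Predict from kernel values between test points (rows) and training points (columns)."""
    return (K_test_train + 1.0) @ alpha
```

With the linear kernel, these predictions coincide with ordinary ridge regression on features augmented by a constant, which offers a simple consistency check before switching to `gaussian_kernel`.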


Algorithm 1 Disaggregation using Self-Regularized Regressive Models

Require: Initialize membership vectors, $v_i = |\mathcal{N}(0; \gamma^2 I)|$, and number of clusters, N, for each day of the data-set.
$N_{days}$ is the total number of days.
for d = 1 to $N_{days}$ do
    Step 1: Clustering
    for t = 1 to 30 do
        Calculate $J_{CS}^{REG}$ and its gradient according to Equations 4 and 10.
        Update $\lambda_i$ and $v_i^{+}$ according to Equations 8 and 9.
    end for
    Step 2: Kernel Regression
    Calculate w according to Equation 18 using the training set.
    Estimate the disaggregated observations, $\hat{y}$, for the test set using Equation 19.
    Run 10-fold cross-validation for the values of N and the regularization constants $\nu$ and $\mu$.
end for


3) Algorithm Summary and Computational Complexity: The SRRM disaggregation is summarized
in Algorithm 1. A ten-fold cross-validation was used to determine the number of clusters (N), the kernel size
for the clustering, and the regularization weight for the regression ($\mu$). The performance of the algorithm was
less sensitive to the kernel size for regression than to the other parameters, so it was set to the standard deviation of y
at coarse scale.

The complexity of the $D_{CS}$ based clustering algorithm is $O(N^2)$ for each iteration. For good convergence, 30
iterations are needed. This is much lower than the size of the data-set and does not affect the complexity
of the algorithm. To reduce the computational load, a stochastic sampling method is used, in which the gradient is
approximated by using M samples out of all N. The complexity then becomes $O(MN)$ ($M \ll N$) per iteration.
M can be much smaller than N while the results remain comparable to the original method, taking a fraction of the time.
The average complexity of the ridge-regression method is $O(N^3)$ [23].
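
The saving from stochastic sampling can be illustrated with the pairwise kernel sums that dominate the clustering cost. The sketch below is not the gradient approximation itself. It only shows, under the assumption of uniform sampling without replacement, how an N × M kernel block can stand in for the full N × N matrix. All names are hypothetical.

```python
import numpy as np

def kernel_block(X, Y, sigma):
    """Gaussian kernel values exp(-||x - y||^2 / (4 sigma^2)) between rows of X and Y."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(d2, 0.0) / (4.0 * sigma**2))

def subsampled_kernel_sum(X, m, sigma, rng):
    """Estimate the full pairwise kernel sum from m randomly chosen columns."""
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    block = kernel_block(X, X[idx], sigma)  # N x M instead of N x N
    return block.sum() * (n / m), block.shape
```

Only M columns of the kernel matrix are evaluated per call, so the cost per iteration scales with MN rather than with the square of N.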


B. The PRI Framework 

The disaggregation methodology using PRI includes a transformation process to obtain a probabilistic relationship
between the variable to be disaggregated, y, at 1 km and the auxiliary information, X, at the same scale. A discrete
formulation of the Bayes rule is used to estimate $y_{initial}$ at fine resolution, as given in Equation 20, wherein $y_{train}$
is discretized into k classes, $i \in [1, k]$, and $x^{j}_{train}$ is discretized into $k_j$ classes, $i_j \in [1, k_j]$, where j indexes the
individual variables that comprise X.


$$ y_{initial} = \arg\max_{i} P\left(y^{i}_{train} \mid X_{train}\right) = \arg\max_{i} \frac{P\left(X_{train} \mid y^{i}_{train}\right) P\left(y^{i}_{train}\right)}{P\left(X_{train}\right)} \qquad (20) $$


In the second step, $y_{initial}$ is merged with the observations at the coarser resolution, $y_{coarse}$, to obtain improved
estimates at fine resolution,

$$ \arg\max_{m} L(m) = H(m) + \beta\, KL\left(p_m \,\|\, p_{y_{initial}}\right) \qquad (21) $$
where L(m) is the cost function, $p_{y_{initial}}$ is the PDF of the original data, and $p_m$ is the PDF at each iteration.
H(m) is the entropy, and KL is the KL divergence. m is initialized to $y_{coarse}$ at the first iteration. $\beta$ is a
user-defined weighting parameter that balances the redundancy and information preservation in L(m). As the value
of $\beta$ increases, the cost function gives more emphasis to KL, thus preserving more information about the data at the
cost of redundancy reduction. In this study, an intermediate value of $\beta = 2$ was chosen so that the
PRI-image would approximate the mean level of y at coarse scales while also embedding the level of detail provided
by the initial estimates of y at 1 km, to obtain morphed estimates of y at 1 km. The computational complexity
of the PRI algorithm is given as $O(N^3 \prod_j k_j)$, where $k_j$ is the number of bins used to estimate the PDF of the j-th
feature for the transformation function. A detailed description of the PRI algorithm can be found in [18].
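
To make the role of β concrete, the following sketch evaluates a cost of the form H(m) + β KL for a candidate set of fine-scale values. It assumes that both PDFs are approximated by histograms over common bin edges. The PRI implementation in [18] may estimate them differently, so the helper names, the bin edges, and the smoothing constant `eps` are assumptions for illustration only.

```python
import numpy as np

def histogram_pmf(samples, edges, eps=1e-12):
    """Normalized histogram with a small constant to avoid log(0)."""
    counts, _ = np.histogram(samples, bins=edges)
    p = counts.astype(float) + eps
    return p / p.sum()

def pri_cost(m, y_initial, edges, beta=2.0):
    """Return (H + beta * KL, H, KL) for candidate values m against y_initial."""
    p_m = histogram_pmf(m, edges)
    p_init = histogram_pmf(y_initial, edges)
    entropy = -np.sum(p_m * np.log(p_m))
    kl = np.sum(p_m * np.log(p_m / p_init))
    return entropy + beta * kl, entropy, kl
```

When the candidate equals the initial estimate, the KL term vanishes and only the entropy remains. A spatially constant candidate has low entropy but a large KL term, which is the trade-off that β controls.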

III. Experimental Description and Results 

A. Multiscale synthetic dataset 

The proposed algorithm for disaggregation was tested using data generated by a simulation framework consisting 
of the Land Surface Process (LSP) model and the Decision Support System for Agrotechnology Transfer (DSSAT) 
model, described in [24]. A 50 × 50 km² region, equivalent to approximately 25 SMAP pixels at 9 km spatial
resolution, was chosen in North Central Florida (see Figure 4) for the simulations. The region encompassed the
UF/IFAS (University of Florida's Institute of Food and Agricultural Sciences) Plant Science Research and Education
Unit, Citra, FL, where a series of season-long field experiments, called the Microwave, Water and Energy Balance
Experiments (MicroWEXs), have been conducted for various agricultural land covers over the last decade [25]-[27].
Simulated observations of LST and LAI were generated at 200 m for a period of one year, from January 1, 2007
through December 31, 2007. Topographic features, such as slope, were not considered in this study because the 
region is typically characterized by flat and smooth terrains with no run-off due to soils with high sand content. 
The soil properties were assumed constant over the study region. 

Fifteen-minute observations of precipitation, relative humidity, air temperature, downwelling solar radiation, and 
wind speed were obtained from eight Florida Automated Weather Network (FAWN) stations [28] located within the 
study region (see Figure 4). The observations were spatially interpolated using splines to generate the meteorological 
forcings at 200 m. Long-wave radiation was estimated following Brutsaert [29]. 

The model simulations were performed over each agricultural field rather than all the pixels, to reduce computation 
time. Based upon land cover information at 200 m, contiguous, homogeneous regions of sweet-corn and cotton 
were identified, as shown in Figure 5. A realization of the LSP-DSSAT model was used to simulate LST, LAI, and 
PPT at the centroid of each homogeneous region, using the corresponding crop module within DSSAT. The model 
simulations were performed using the 200 m forcings at the centroid, as shown in Figure 5. Linear averaging is 
typically sufficient to illustrate the effects of resolution degradation [30]. The model simulations at 200 m were 
spatially averaged to obtain PPT, LST, LAI, SM, and $T_B$ at 1 and 10 km. The SM obtained at 1 km was divided
into training and test sets, which served as simulated "in-situ" measurements to train the algorithm and as truth to
evaluate the disaggregation methodology, respectively. PPT, LAI and LST are typically chosen due
to their high correlations with SM [14], [18], [31]. Other geophysical descriptors such as slope and soil texture
were not used in this study because of their limited utility in a flat and primarily sandy region, such as that in
North Central Florida. To simulate rainfed systems, all the water input from both precipitation and irrigation was
combined, and the "PPT" in this study represents these combined values.
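
The linear averaging used to move between scales can be sketched as a block mean. The example assumes a square grid whose side is a multiple of the aggregation factor, so that a 250 × 250 field at 200 m becomes 50 × 50 at 1 km and 5 × 5 at 10 km. The function name is hypothetical.

```python
import numpy as np

def block_mean(field, factor):
    """Average non-overlapping factor x factor blocks of a 2-D field."""
    rows, cols = field.shape
    if rows % factor or cols % factor:
        raise ValueError("grid size must be a multiple of the aggregation factor")
    blocks = field.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))
```

For instance, `block_mean(sm_200m, 5)` would give the 1 km field and `block_mean(sm_200m, 50)` the 10 km field, where `sm_200m` stands for any simulated 200 m array.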

B. Disaggregation Framework based on SRRM 

The simulation period, from Jan 1 (DoY 1) to Dec 31 (DoY 365), 2007, consisted of two growing seasons of
sweet corn and one season of cotton, as shown in Table I. The LST, PPT, and LAI observations at 1 km were
obtained by adding white Gaussian noise to account for satellite observation errors, instrument measurement errors,
and micro-meteorological variability, following [32]-[34]. Errors with zero mean and standard deviations of 5 K, 1
mm/hour, 0.03 m³/m³ and 0.1 for LST, PPT, SM and LAI, respectively, were added to the values at 10 km.
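
A minimal sketch of this noise model is shown below. It assumes independent zero-mean Gaussian errors per pixel with the standard deviations quoted above. The dictionary keys and the function name are illustrative only.

```python
import numpy as np

# Standard deviations quoted in the text: K, mm/hour, m^3/m^3, and unitless LAI
NOISE_STD = {"LST": 5.0, "PPT": 1.0, "SM": 0.03, "LAI": 0.1}

def add_observation_noise(fields, rng):
    """Return a copy of each named field with white Gaussian noise added."""
    return {
        name: values + rng.normal(0.0, NOISE_STD[name], size=values.shape)
        for name, values in fields.items()
    }
```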

The SRRM uses LST, 3-day PPT, LAI, LC at 1 km and SM at 10 km every 3 days as input. In the first step, the
information-theoretic cost function described in Section II-A is used for clustering using the inputs at 1 km and
the x and y coordinates of each pixel scaled to the range [0, 1]. This step of the algorithm uses two parameters:
the number of clusters, N, and a regularization constant, $\nu$. Both the number of clusters and the regularization
constant are determined by cross-validating against the absolute mean error in SM at the end of the second step for
each day.

The optimal number of iterations that produced a usable clustering result was determined by the minimum
root mean square error (RMSE) for a day when both the land cover and micro-meteorological conditions were
heterogeneous, DoY 222, providing the worst-case scenario for convergence of the clustering algorithm. At the end
of this step, each pixel has a vector of N numbers, $(m_1, m_2, \ldots, m_N)$, that sum to 1, describing its membership to
each of the N clusters. Figure 6 shows the spatially averaged RMSE between disaggregated SM and the observations
at 1 km on DoY 222 for different iterations of the clustering algorithm. All parameters, except the number of
clusters, were cross-validated for each individual iteration. The number of clusters was cross-validated once, using
50 iterations of the clustering algorithm. For the cross-validation, the training set was randomly divided into 10
equal parts. Nine parts were used for training and one part was used for evaluating the algorithm. This methodology,
known as 10-fold cross-validation, is repeated ten times with different randomly selected partitions to approximate
the average errors that the SRRM would incur. The error oscillates with a mean amplitude of $1.2 \times 10^{-4}$ m³/m³
after 30 iterations. In this study, 30 iterations of the clustering algorithm are used.
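
The cross-validation procedure can be sketched as follows. The example assumes generic `fit` and `predict` callables supplied by the user and reports the mean RMSE over the held-out parts. These names and the `seed` argument are hypothetical and do not correspond to a specific library interface.

```python
import numpy as np

def k_fold_indices(n_samples, k, rng):
    """Randomly split the sample indices into k nearly equal parts."""
    return np.array_split(rng.permutation(n_samples), k)

def cross_validated_rmse(fit, predict, X, y, k=10, seed=0):
    """Train on k - 1 parts, evaluate on the held-out part, and average the RMSE."""
    rng = np.random.default_rng(seed)
    folds = k_fold_indices(len(y), k, rng)
    errors = []
    for i, held_out in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train], y[train])
        resid = predict(model, X[held_out]) - y[held_out]
        errors.append(np.sqrt(np.mean(resid**2)))
    return float(np.mean(errors))
```

Calling `cross_validated_rmse` once per candidate value of a parameter, and keeping the value with the lowest error, reproduces the selection procedure described above in outline.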

In the second step, N models, $f_1, f_2, \ldots, f_N$, are developed using LST, 3-day PPT, LAI, LC, SM at 1 km
and SM at 10 km as inputs to the regularized kernel regression algorithm described in Section II-A, using the
training set. The training set consisted of a randomly selected 33% of the pixels, or 500 out of the 2500
pixels that make up the region. The remaining pixels were used as the test set. The hard membership of each
pixel, i, for model development purposes is determined by the maximum value in its membership vector, $m_i =
(m_{i1}, m_{i2}, \ldots, m_{iN})$. The disaggregated value of SM is computed for each point in the test set, represented as a vector,
$x'_i = (LST_i^{1km}, PPT_i^{1km}, LAI_i^{1km}, LC_i^{1km}, SM_i^{10km})$, by,

$$ SM_i^{1km} = m_i^T \cdot \left(f_1(x'_i), f_2(x'_i), \ldots, f_N(x'_i)\right) \qquad (22) $$
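
The membership-weighted combination in Equation 22 amounts to a row-wise dot product between the membership vectors and the per-cluster model outputs. The sketch below assumes that the outputs of the cluster models have already been evaluated and stored in an array with one column per cluster. The function and argument names are hypothetical.

```python
import numpy as np

def disaggregate(memberships, cluster_predictions):
    """Combine per-cluster predictions with soft memberships (both pixels x clusters)."""
    memberships = np.asarray(memberships, dtype=float)
    cluster_predictions = np.asarray(cluster_predictions, dtype=float)
    if not np.allclose(memberships.sum(axis=1), 1.0):
        raise ValueError("each membership vector must sum to one")
    return np.sum(memberships * cluster_predictions, axis=1)
```

A pixel that belongs entirely to one cluster simply takes that cluster's prediction, while a pixel on a cluster boundary receives a blend of the neighbouring models.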

The SRRM is evaluated using the RMSE and standard deviation of the errors over the entire season. The RMSE
over the entire time period is assessed for each land cover. Moreover, the disaggregated SM is compared with
the true SM. To evaluate how close the density function of the disaggregated estimates is to the density function of
the true SM, the Kullback-Leibler divergence (KLD) between the density of the estimated observations and the true
SM is calculated for different LCs over the season. The KLD is a member of the class of well-known f-divergences
that convey distances in probability space. Any other f-divergence, such as the Hellinger distance or the χ²-distance, can
also be used. A sensitivity analysis was conducted to determine how each auxiliary variable separately contributed
to errors in downscaled SM. A single auxiliary variable, LST, LAI or PPT, was allowed to vary for
each land cover, while the others were set to their mean values. The relative root mean square errors ($RMSE_r$),
$RMSE_r = \Delta RMSE / RMSE_{org}$, were investigated, where $\Delta RMSE$ is the change in RMSE when a single auxiliary variable
is used compared to when all the auxiliary variables are used ($RMSE_{org}$) for each day in 2007. The daily $RMSE_r$
averaged over each LC (baresoil, corn and cotton) in 2007 is also studied.
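
The evaluation metrics can be sketched as below. The KLD is computed here from histograms over a fixed SM range, with the true SM as the reference density. The direction of the divergence, the number of bins, the value range, and the smoothing constant are all assumptions made for this example.

```python
import numpy as np

def kl_divergence(true_sm, est_sm, bins=20, value_range=(0.0, 0.5), eps=1e-10):
    """Histogram-based KL divergence between true and estimated SM densities."""
    p, _ = np.histogram(true_sm, bins=bins, range=value_range)
    q, _ = np.histogram(est_sm, bins=bins, range=value_range)
    p = (p + eps) / np.sum(p + eps)
    q = (q + eps) / np.sum(q + eps)
    return float(np.sum(p * np.log(p / q)))

def rmse(a, b):
    """Root mean square error between two arrays of the same shape."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

def relative_rmse(rmse_single, rmse_org):
    """Relative change in RMSE when a single auxiliary variable is used."""
    return (rmse_single - rmse_org) / rmse_org
```

Identical samples give a divergence of exactly zero, and the divergence grows as the estimated distribution shifts away from the true one.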

In addition, 5 days were selected from the season to understand the effect of the heterogeneity in inputs on the 
error in disaggregated SM. Variabilities in precipitation, ranging from uniformly wet to uniformly dry, and in land
cover, ranging from bare soil to vegetated with both cotton and sweet corn, were used as criteria for selecting the
days, as shown in Table II. Quantitative analyses of spatial variations in SM observed under dynamic vegetation
and heterogeneous land cover conditions provide an index of dynamic errors that can be expected. The utility 
of using multiple models in the region, i.e. one model for each cluster, was also investigated by comparing the 
disaggregation results to when the entire dataset is considered as a single cluster and only one model is used for 
disaggregation, on DoY 222 of the study. 

The spatially averaged RMSE for each DoY in the simulation period is shown in Figure 7. A Z-test was performed
to evaluate whether the disaggregated SM at 1 km is within a standard deviation of ±0.04 m³/m³ from the true
SM at 1 km, for meaningful use in hydrological models [35]. This null hypothesis was found to hold for every
day of the simulation period. Figure 8 shows the cumulative distribution function (CDF) of the errors in disaggregated
SM. About 98% of the days have an RMSE of less than 0.02 m³/m³ in the disaggregated SM. Figure 9 shows
the disaggregated SM versus true SM at 1 km. The algorithm does not introduce any bias and the data points are
scattered around the $\hat{y} - y = 0$ line, with a positive variance. Most of the points for sweet-corn pixels and all of the
points for cotton lie within 0.04 m³/m³. Figure 10(a) shows the errors for each DoY segregated by type of LC.
Baresoil pixels during periods of vegetation have the highest RMSE. This is due to sub-pixel vegetation at 250 m
within a pixel classified as a baresoil pixel, when the vegetation fraction is < 0.5 at 1 km. Table III shows the
KLD between the densities of the disaggregated estimates and the true SM. Baresoil pixels at 1 km without any
vegetation at 250 m have the lowest KLD. Baresoil pixels at the end of the season that are affected by remnant
crops, and baresoil pixels at 1 km with partial vegetation cover at 250 m, have a higher KLD, but one still very close to 0.
Vegetated pixels at 1 km have a higher KLD as well. The boundary pixels classified as baresoil have vegetation
at the 250 m scale, contributing to these errors.

Among the three scenarios considered for the sensitivity analysis, RMSEs in downscaled SM are the lowest when
just LST is used for disaggregation. This suggests that SM is more strongly coupled to LST than to LAI or PPT.
This is expected since the spatial patterns apparent in LST images also appear in the SM image, especially for baresoil
pixels, as shown in Figure 11(b), with a weaker and more complex relationship in corn and cotton, as shown in
Figures 11(c) and (d), respectively. LAI shows higher and similar effects on errors in disaggregated SM during
the mid and late growing seasons of corn and cotton crops. The use of PPT to disaggregate SM results in lower
RMSEs immediately following a major rainfall event. At other times, its effect on the errors is comparable to that of LST
for baresoil pixels, and to that of LAI for vegetated pixels.

For the five selected days, the inputs, clustering results, the first SM estimate, and PRI disaggregated SM are 
shown in Figures 12-15. The clustering results indicate that the implicit inclusion of spatial coordinate information 
adequately constrains the clusters from becoming too small, while the LST, LAI, PPT and LC ensure that the clusters 
are simultaneously representative of the land-cover and meteorological conditions in the region. When fields are 
significantly smaller than the resolution of auxiliary variables, for example, in developing countries, the implicit 
inclusion of coordinates might not result in a clustering that accurately follows field boundaries, although it would 
still separate out regions with different meteorological conditions. This would reduce the accuracy of disaggregated 
SM at the field edges, and post-processing based on finer-scale land cover would be needed in such 
scenarios. Both DoY 39, shown in Figure 12, and DoY 354, shown in Figure 13, have bare soil land cover, 
before and after the growing seasons, respectively. The disaggregated estimates for both days are very close to 
the true SM at 1 km, but due to crop residue and slightly heterogeneous precipitation in the region (Figure 13b), 
the error for DoY 354 is higher than for DoY 39. It was found that heterogeneity in any one input is enough to 
capture vegetation patterns in the disaggregated estimate using kernel regressive models, as shown in Figures 13a 
and 14a for corn and cotton, when the LST is fairly uniform across the region while PPT is heterogeneous due 
to precipitation patterns. On DoY 222, even when there was maximum heterogeneity in LC with corn, cotton, and 
bare soil, the error in SM is minimal as shown in Figure 16. The effects of noise amplitude in the coarse scale 
SM on the disaggregated SM were also investigated on DoY 222. Independent Gaussian noise with zero mean and 
standard deviations ranging from 0 to 0.1 m³/m³ was added to the coarse-scale SM, and the spatially averaged 
unbiased RMSE in disaggregated SM is shown in Figure 17. The errors grow sub-linearly, i.e. with a slope lower 
than 1, while the uncertainty in coarse SM is < 0.06 m³/m³. When the uncertainty in coarse SM is > 0.06 m³/m³, the 
errors grow with a slope of 1.14, showing that the errors in the disaggregated SM grow faster than 
the uncertainties added to coarse SM. 
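The noise experiment described above can be sketched as follows. The unbiased RMSE is taken here in its common form (RMSE after removing the mean of each field), which is an assumption about the exact metric used. The `replicate` function is a hypothetical stand-in for the disaggregation step, not the SRRM; it only copies each coarse pixel onto the fine grid so that the sweep is runnable.

```python
import numpy as np

def unbiased_rmse(est, true):
    """RMSE after removing the spatial mean of each field (ubRMSE)."""
    est = np.asarray(est, dtype=float)
    true = np.asarray(true, dtype=float)
    d = (est - est.mean()) - (true - true.mean())
    return float(np.sqrt(np.mean(d ** 2)))

def replicate(coarse, factor=10):
    """Hypothetical stand-in for a disaggregation algorithm (NOT the SRRM):
    copies each coarse pixel to a factor x factor block of fine pixels."""
    return np.kron(coarse, np.ones((factor, factor)))

def noise_sweep(coarse_sm, true_fine, disaggregate, sigmas, seed=0):
    """Add zero-mean Gaussian noise to the coarse SM and record the ubRMSE."""
    rng = np.random.default_rng(seed)
    errors = []
    for sigma in sigmas:
        noisy = coarse_sm + rng.normal(0.0, sigma, size=coarse_sm.shape)
        errors.append(unbiased_rmse(disaggregate(noisy), true_fine))
    return np.array(errors)

# Synthetic example: a uniform 0.2 m^3/m^3 field on a 20 x 20 coarse grid.
coarse = np.full((20, 20), 0.2)
truth = replicate(coarse)
sigmas = np.linspace(0.0, 0.1, 11)
errors = noise_sweep(coarse, truth, replicate, sigmas)
# Slope of error growth versus noise level (least-squares line fit).
slope = np.polyfit(sigmas, errors, 1)[0]
print(errors, slope)
```

Replacing `replicate` with an actual disaggregation routine and fitting the slope separately below and above a chosen noise level would reproduce the kind of analysis summarized in Figure 17.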

The novelty and efficacy of this disaggregation algorithm lie in the utilization of multiple models using clusters. 
As evident in Figure 18(a), a regression model based on a single cluster fails to fit the coarse SM and auxiliary 
data with a sufficient degree of accuracy, resulting in speckle noise in the disaggregated soil moisture. In contrast, 
Figure 18(b) shows that using multiple cluster-based models adequately fits the coarse 
SM and the auxiliary data, and provides disaggregated estimates of SM with low RMSE. 
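The contrast between one global model and several cluster-based models can be illustrated with a toy sketch. It is not the SRRM implementation: it uses a generic weighted kernel ridge regression, soft memberships that are simply given rather than computed by the clustering step, and synthetic data in which two hypothetical land-cover regimes follow different LST-SM relationships. All function names and parameter values are assumptions.

```python
import numpy as np

def rbf_kernel(a, b, gamma=10.0):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def fit_weighted_krr(x, y, w, lam=1e-3, gamma=10.0):
    """Weighted kernel ridge regression; w holds per-sample memberships.

    Solves (W K + lam I) alpha = W y, the stationarity condition of
    sum_i w_i (y_i - f(x_i))^2 + lam ||f||^2 with f = K alpha.
    """
    k = rbf_kernel(x, x, gamma)
    return np.linalg.solve(w[:, None] * k + lam * np.eye(len(y)), w * y)

def predict_mixture(x_train, alphas, x_new, u_new, gamma=10.0):
    """Blend per-cluster predictions using soft memberships u_new (n x C)."""
    k = rbf_kernel(x_new, x_train, gamma)
    per_cluster = np.stack([k @ a for a in alphas], axis=1)
    return (u_new * per_cluster).sum(axis=1)

def make_toy(n, rng):
    """Synthetic data: two regimes with opposite LST-SM slopes."""
    x = rng.uniform(0.0, 1.0, size=(n, 1))      # stand-in for normalised LST
    regime = rng.integers(0, 2, size=n)         # 0: bare soil, 1: vegetated
    y = np.where(regime == 0, 0.30 - 0.15 * x[:, 0], 0.10 + 0.15 * x[:, 0])
    # Soft memberships: 0.95 for the pixel's own regime, 0.05 for the other.
    u = np.where(regime[:, None] == np.arange(2)[None, :], 0.95, 0.05)
    return x, y, u

def rmse(pred, true):
    return float(np.sqrt(np.mean((pred - true) ** 2)))

rng = np.random.default_rng(0)
x_tr, y_tr, u_tr = make_toy(200, rng)
x_te, y_te, u_te = make_toy(200, rng)

# Single model: every sample has weight 1 (one cluster for the whole region).
alpha_single = fit_weighted_krr(x_tr, y_tr, np.ones(len(y_tr)))
rmse_single = rmse(rbf_kernel(x_te, x_tr) @ alpha_single, y_te)

# Multiple models: one weighted fit per cluster, blended by membership.
alphas = [fit_weighted_krr(x_tr, y_tr, u_tr[:, c]) for c in range(2)]
rmse_multi = rmse(predict_mixture(x_tr, alphas, x_te, u_te), y_te)
print(rmse_single, rmse_multi)
```

In this toy setting the regime is conveyed only through the memberships, so the single model can do no better than average the two relationships, while the blended models recover each one. Real data would also require the clustering step that produces the memberships.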

C. Comparison between SRRM & PRI 

The PRI method uses LST, 3-day PPT, LAI, LC, and SM at 1 km every 3 days as input to obtain the first estimate 
of the SM. To disaggregate SM, in Equation 20, X is set to {LST, PPT, LAI, LC} and y_train is set to the SM at 1 km. 
In this study, 33% of the data set, selected randomly, is used for training the parametric Bayesian model. For the 
second step, in Equation 21, the SM observations at 10 km are set as y_coarse and the first estimates of SM at 1 km 
from the transformation function are set as y_initial. The value of m after the cost function I(m) is minimized gives 
the disaggregated SM estimates. 

The disaggregated estimates using the SRRM were compared with the PRI estimates using the RMSE and the 
KLD of the estimated densities of the disaggregated observations. The RMSEs over the entire time period, 
assessed for each land cover using the SRRM and PRI algorithms, are compared. The spatial errors are also compared 
for the selected five days during the simulation period, representing different micro-meteorological and land cover 
conditions. Finally, the running times of the SRRM and PRI algorithms are compared to understand the effects of 
the difference in complexity between the two algorithms. 

Figure 7 shows that the RMSE of the disaggregated observations using the self-regularized regressive models 
was less than the RMSE using the PRI algorithm. The trends observed when the PRI algorithm is used, such 
as higher errors during periods of vegetation, are preserved when SRRM is used. However, variations in the 
difference observed between the SRRM and PRI based SM can be explained by their different correlations with LC 
and micro-meteorological conditions. The use of separate models in the SRRM enables low RMSEs even under 
highly heterogeneous LC. In contrast, the RMSEs increase by a larger magnitude during heterogeneous LC periods 
for the PRI algorithm because it uses a single disaggregation model for the whole study region. 

Table III compares the KLD between the disaggregated estimates generated by the SRRM and PRI algorithms, 
and true SM at 1 km. The general trends of KLD over different LC conditions followed by the SRRM are similar 
to those observed for the PRI. However, the errors for each LC are individually lower for the SRRM compared to 
the PRI method, as shown in Figure 10. This is further validated by the KLD for the SRRM estimates, which is 3 
orders of magnitude lower than that for the PRI estimates. 
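As a quick arithmetic check on the KLD comparison, the sketch below computes the per-land-cover gap in orders of magnitude, log10(KLD_PRI / KLD_SRRM), from the values transcribed from Table III. It assumes the garbled exponents in the extracted table are negative (for example, 10^-17 for corn); the dictionary layout and function name are illustrative.

```python
import math

# (KLD_SRRM, KLD_PRI) per land cover, as transcribed from Table III.
# Assumption: the exponent signs lost in extraction are negative.
kld = {
    "Corn": (1.8615e-17, 0.0234),
    "Cotton": (2.4828e-04, 0.0283),
    "Baresoil b": (5.6222e-05, 0.1036),
    "Baresoil c": (5.628e-06, 0.0120),
    "Baresoil d": (2.5948e-06, 0.0114),
}

def orders_of_magnitude(kld_srrm, kld_pri):
    """Number of decades by which the PRI KLD exceeds the SRRM KLD."""
    return math.log10(kld_pri / kld_srrm)

orders = {lc: orders_of_magnitude(*vals) for lc, vals in kld.items()}
for land_cover, gap in orders.items():
    print(f"{land_cover}: {gap:.1f} orders of magnitude")
```

With these transcribed values, the three baresoil classes come out slightly above three orders of magnitude, cotton at about two, and corn far higher, so the size of the gap varies by land cover.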

Figures 12-15 compare the disaggregated SM estimates using the SRRM to those using the PRI method. The 
estimates using PRI do not have sharply defined regions, unlike those observed in the disaggregated SM using 
SRRM. The sharpness of a disaggregated result could arise either from noise or from spatial discontinuities in the 
inputs due to physical discontinuities in meteorological or land-cover conditions. Any disaggregation algorithm must 
maintain the latter while suppressing the former. Equation 21, with β = 2, uses a cost function that blurs the 
disaggregated SM so that the median error over all pixels is minimized at the cost of a greater variance in error. 
In the SRRM, the use of multiple models based on clusters ensures that the spatial discontinuity is maintained in 
the disaggregated SM when it is caused by a physical discontinuity. If the discontinuity originates from additive 
noise, under certain assumptions, the kernel regression suppresses the discontinuity in the disaggregated SM. The 
assumptions are that the noise is spatially uncorrelated and has a wide probability density function. The results in 
[36] show that both assumptions are reasonable. 

The average execution time of the PRI was about 1.56 hours/disaggregation day and that of the SRRM was about 
32 minutes/disaggregation day. This is expected because the complexity of PRI is $O(N^3 \prod_j k_j)$, where $k_j$ is the 
number of bins used to estimate the PDF. For an adequate estimate of the PDFs, $\prod_j k_j \sim N$ and the complexity 
approaches $O(N^4)$, which is an order of magnitude higher than the complexity of the SRRM based algorithm. 
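The reported run times can be converted to a common unit to check the speed-up. The symbols $t_{\mathrm{PRI}}$ and $t_{\mathrm{SRRM}}$ are introduced here only for illustration.

```latex
\frac{t_{\mathrm{PRI}}}{t_{\mathrm{SRRM}}}
  \approx \frac{1.56 \times 60\ \text{min}}{32\ \text{min}}
  = \frac{93.6}{32}
  \approx 2.9
```

This agrees with the roughly threefold reduction in computational time stated elsewhere in the paper.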

Thus, the SRRM achieves low mean errors using a non-linear regression and low error variances using multiple 
regressive models with soft boundaries. This ensures that sharpness is maintained along with low RMSEs. Given 
exhaustive training data, the PRI algorithm will have performance similar to that of the SRRM, as shown in [18]. 
However, for operational use of the methodologies at field scale in regions with highly varied LC or micro-meteorology 
and a low volume of training data, the SRRM provides sharp images of disaggregated SM with a faster run time, less 
complexity, and lower RMSEs compared to the PRI. 

IV. Conclusion 

In this study, a disaggregation methodology based upon SRRM was developed, implemented and evaluated 
that preserves the high variability in SM due to heterogeneous meteorological and vegetation conditions. The 
SRRM preserves heterogeneity by utilizing a clustering algorithm to create a number of regions of similarity, which 
are subsequently used in a kernel regression framework. The clusters were computed using RS products, viz. PPT, 
LST, LAI, and LC. The kernel regression was implemented on the clusters using in-situ SM. 96% of the pixels 
across the whole season were found to have a disaggregation error of less than 0.02 m³/m³. The KLD values for 
disaggregated SM at 1 km for the SRRM were close to 0 for all land covers. In contrast, the PRI method has KLD 
values that are several orders of magnitude higher, and has a threefold higher execution time. The averaged spatial 
error is also markedly lower for the SRRM compared to the PRI method. 

It is envisioned that the SRRM implemented and evaluated in this study may be applied using satellite 
images. For example, the PPT data may be obtained from the Global Precipitation Measurement mission, and the 
LAI, LST and LC products are available from the MODIS sensors aboard the Aqua and Terra satellites. 

TABLE I 

Planting and harvest dates for sweet corn and cotton during the 2007 growing season 

Crop          Planting DoY    Harvest DoY 
Sweet Corn    61              139 
              183             261 
Cotton        153             332 

TABLE II 

Days selected for evaluating SRRM estimates. These days capture variability in precipitation/irrigation (PPT) and 
land cover (LC) 

DoY    PPT               LC 
39     Dry               Bare 
135    Dry, Irrigated    Sweet Corn 
156    Wet               Cotton 
222    Dry, Irrigated    Sweet Corn and Cotton 
354    Wet               Bare 

TABLE III 

KL divergence over the 50 x 50 km² region for the disaggregated estimates of SM obtained at 1 km using the SRRM 
and PRI methods. 

Land Cover     KLD_SRRM          KLD_PRI 
Corn           1.8615 x 10^-17   0.0234 
Cotton         2.4828 x 10^-4    0.0283 
Baresoil (b)   5.6222 x 10^-5    0.1036 
Baresoil (c)   5.628 x 10^-6     0.0120 
Baresoil (d)   2.5948 x 10^-6    0.0114 

(b) Baresoil pixels with vegetated sub-pixels at 250 m until DoY 332 
(c) Baresoil pixels after DoY 332 
(d) Baresoil pixels without any vegetated sub-pixels at 250 m until DoY 332 

Fig. 1. Flowchart of the SRRM based algorithm. 

Fig. 2. Flow diagram of the Self-regularized Kernel Regression models. 

Fig. 3. Clustering result obtained from (a) DCS based clustering algorithm, and (b) K-Means clustering algorithm. 

Fig. 4. Study region in North Central Florida. LSP-DSSAT-MB simulations were performed over the shaded 50 x 50 km² region. 

Fig. 5. (a) Land cover at 200 m during cotton and corn seasons. White, gray, and black shades represent baresoil, cotton, and sweet-corn 
regions, respectively. Homogeneous crop fields along with centers for (b) sweet-corn and (c) cotton. 

Fig. 6. Root mean square error in disaggregated soil moisture at 1 km versus number of iterations of the DCS clustering algorithm. 

Fig. 7. Spatially averaged root mean square error in disaggregated soil moisture at 1 km for each day of the year in the simulation period 
using the SRRM and PRI method. 

Fig. 8. Cumulative density function of the errors in soil moisture. 

Fig. 9. Disaggregated soil moisture vs. true soil moisture at 1 km during the whole season for (a) baresoil pixels, (b) corn pixels, and 
(c) cotton pixels. Lines corresponding to 4% soil moisture are shown for each plot. 

Fig. 10. Spatially averaged root mean square error in disaggregated soil moisture at 1 km for each day of the year in the simulation period 
for baresoil, corn and cotton landcovers using the (a) SRRM and (b) PRI method. 

Fig. 11. Relative change in RMSE (RMSE_r) when only LST, LAI or PPT is used as input for disaggregation in (a) the whole region, 
(b) bare soil, (c) corn, and (d) cotton for each day of 2007. 

Fig. 12. DoY 39 - (a) LAI at 1 km, (b) PPT at 1 km, (c) LC at 1 km (yellow represents baresoil), (d) true SM at 1 km, (e) LST at 1 km, 
(f) clustering result at 1 km, (g) SM observations at 10 km, (h) disaggregated SM using SRRM, (i) disaggregated SM using PRI method. 

Fig. 13. DoY 354 - (a) LAI at 1 km, (b) PPT at 1 km, (c) LC at 1 km (yellow represents baresoil), (d) true SM at 1 km, (e) LST at 1 km, 
(f) clustering result at 1 km, (g) SM observations at 10 km, (h) disaggregated SM using SRRM, (i) disaggregated SM using PRI method. 

Fig. 14. DoY 135 - (a) LAI at 1 km, (b) PPT at 1 km, (c) LC at 1 km (yellow represents baresoil, blue represents corn), (d) true SM at 
1 km, (e) LST at 1 km, (f) clustering result at 1 km, (g) SM observations at 10 km, (h) disaggregated SM using SRRM, (i) disaggregated 
SM using PRI method. 




















Fig. 15. DoY 156 - (a) LAI at 1 km, (b) PPT at 1 km, (c) LC at 1 km (yellow represents baresoil, green represents cotton), (d) true SM at 
1 km, (e) LST at 1 km, (f) clustering result at 1 km, (g) SM observations at 10 km, (h) disaggregated SM using SRRM, (i) disaggregated 
SM using PRI method. 

Fig. 16. DoY 222 - (a) LAI at 1 km, (b) PPT at 1 km, (c) LC at 1 km (yellow represents baresoil, blue represents corn and green represents 
cotton), (d) true SM at 1 km, (e) LST at 1 km, (f) clustering result at 1 km, (g) SM observations at 10 km, (h) disaggregated SM using 
SRRM method, (i) disaggregated SM using PRI method. 

Fig. 17. Standard deviation of noise added to coarse scale SM vs. unbiased RMSE in disaggregated SM. 

Fig. 18. SM at 10 km, true SM at 1 km and disaggregated SM at 1 km using (a) a single cluster for the study region, and (b) multiple 
clusters following the SRRM algorithm. 

























